I conducted a psychological evaluation with students of CHRIST (Deemed to be University) Pune, Lavasa Campus as the target population. The evaluation aimed to calculate a Resilience Score and a Social Support Score for each individual.
Link to the form: https://forms.gle/XKhTci8azE9GVMkT9
Social Support: A network of family, friends, neighbors, and community members that is available in times of need to give psychological, physical, and financial help.
Resilience: The ability to cope with and recover from setbacks.
After the scores were calculated, the data was analysed to predict the Resilience Score from the Social Support variables. The project primarily focuses on sampling techniques and prediction/estimation. The questionnaires selected for this evaluation have already been used in publications and have valid, verified scoring techniques.
The links to the questionnaire and their respective scoring techniques are listed below:
Collect data from students and calculate their Resilience and Social Support Scores. The collected data is considered noisy, so various library functions are applied to clean it and gain meaningful insights.
# Importing all the necessary libraries/modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from krishKiLibrary import countUnique
import plotly.express as px
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
df = pd.read_csv("D:/Z/CU/SEM3/Data Analytics/CIA/EndSem/Datasets/21112016_KrishAgarwal_CleanedDataset.csv")
df = df.drop([df.columns[0]], axis = 1) # Dropping unnecessary columns
df
| | Age | Gender | State | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | ... | R7 | R8 | R9 | R10 | R11 | R12 | SS Score | SS Status | R Score | R Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | Female | Telangana | 5 | 3 | 6 | 5 | 4 | 6 | 5 | ... | 1 | 1 | 1 | 3 | 1 | 1 | 4.916667 | Moderate | 18 | Developing |
| 1 | 19 | Female | Jharkhand | 4 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 2 | 3 | 3 | 3 | 3 | 3 | 5.833333 | High | 38 | Established |
| 2 | 19 | Male | Chhattisgarh | 4 | 5 | 7 | 6 | 4 | 5 | 5 | ... | 4 | 3 | 5 | 5 | 4 | 4 | 5.250000 | High | 52 | Exceptional |
| 3 | 19 | Male | Maharashtra | 5 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 1 | 2 | 3 | 4 | 4 | 3 | 5.250000 | High | 35 | Developing |
| 4 | 19 | Male | Andhra Pradesh | 6 | 4 | 6 | 4 | 4 | 4 | 4 | ... | 3 | 3 | 3 | 2 | 3 | 4 | 4.333333 | Moderate | 36 | Developing |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 103 | 21 | Male | West Bengal | 5 | 5 | 6 | 5 | 1 | 1 | 1 | ... | 1 | 2 | 2 | 1 | 2 | 2 | 2.750000 | Low | 23 | Developing |
| 104 | 20 | Male | Assam | 7 | 4 | 6 | 4 | 6 | 6 | 6 | ... | 3 | 1 | 4 | 5 | 4 | 3 | 5.416667 | High | 50 | Exceptional |
| 105 | 19 | Male | Gujarat | 7 | 7 | 7 | 7 | 7 | 7 | 7 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 7.000000 | High | 60 | Exceptional |
| 106 | 16 | Male | Gujarat | 1 | 1 | 1 | 1 | 1 | 1 | 5 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 2.083333 | Low | 40 | Established |
| 107 | 17 | Female | Delhi | 7 | 6 | 5 | 5 | 5 | 4 | 3 | ... | 1 | 2 | 4 | 4 | 2 | 4 | 4.916667 | Moderate | 44 | Strong |
108 rows × 31 columns
The dataframe consists of 108 entries in total which were collected from students pursuing their degrees at CHRIST (Deemed to be University) Pune, Lavasa Campus. The data is spread across 20 different states.
The 'SS' and 'R' columns hold the responses to the Social Support and Resilience questionnaires respectively; each questionnaire contributes 12 features. The final scores and their categories were computed using the scoring techniques specified for each questionnaire.
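To illustrate how the final scores relate to the item columns, the sketch below assumes that 'SS Score' is the mean of the 12 SS items and 'R Score' is the sum of the 12 R items. This is only an assumption inferred from the rows visible in the table above (the respondent with 7 on every visible SS item and 5 on every visible R item has an SS Score of 7.0 and an R Score of 60); the authoritative rules are the questionnaires' own scoring techniques. The one-row frame `demo` is hypothetical.

```python
import pandas as pd

ss_cols = [f"SS{i}" for i in range(1, 13)]
r_cols = [f"R{i}" for i in range(1, 13)]

# Hypothetical respondent: 7 on every SS item, 5 on every R item
demo = pd.DataFrame([{**{c: 7 for c in ss_cols}, **{c: 5 for c in r_cols}}])

# Assumed scoring (not confirmed by the source): mean of SS items, sum of R items
demo["SS Score"] = demo[ss_cols].mean(axis=1)
demo["R Score"] = demo[r_cols].sum(axis=1)
print(demo[["SS Score", "R Score"]])
```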
Population Size: 108
Sample Size (Pre-Defined): 30
Elements from Each Stratum: $n_i = n\frac{N_i}{N}$
stratas = countUnique(df, df['R Status'].unique(), 'R Status')
print('Stratas:', stratas)
'''
n1 --> Developing
n2 --> Established
n3 --> Exceptional
n4 --> Strong
'''
Stratas: {'Developing': 39, 'Established': 35, 'Exceptional': 12, 'Strong': 22}
'\nn1 --> Developing\nn2 --> Established\nn3 --> Exceptional\nn4 --> Strong\n'
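The allocation formula $n_i = n\frac{N_i}{N}$ can be checked by hand against the stratum sizes printed above. The following is a minimal sketch; the variable names are illustrative and not part of the notebook.

```python
# Stratum sizes as printed above
strata = {"Developing": 39, "Established": 35, "Exceptional": 12, "Strong": 22}

N = sum(strata.values())  # population size (108)
n = 30                    # pre-defined sample size

# n_i = n * N_i / N, rounded to the nearest whole respondent
allocation = {name: round(n * N_i / N) for name, N_i in strata.items()}
print(allocation)
print(sum(allocation.values()))
```

With these sizes the rounded allocations happen to add up to exactly 30. In general, rounding each stratum independently can leave the total one or two off the target, so the sum is worth checking.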
def propStratifiedSampling(df, column, sample_size):
    import pandas as pd
    # Count of elements in each category
    stratas_ = {}
    for i in range(len(df[column].unique())):
        k = 0
        for j in range(len(df[column])):
            if df[column][j] == df[column].unique()[i]:
                k += 1
        stratas_[df[column].unique()[i]] = k
    # Defining pop_size and additional dataframe
    population_size = len(df)
    sample_df = pd.DataFrame()
    # Calculating number of elements from each stratum to be taken into sample
    for i in range(len(stratas_)):
        n_i = round(sample_size*(list(stratas_.values())[i]/population_size))
        # adding n_i no.of random elements from stratums into sample dataframe
        df_ = df[df[column] == list(stratas_.keys())[i]].sample(frac = 1)[0:n_i]
        sample_df = pd.concat([sample_df, df_])
    # Shuffling the dataframe
    sample_df = sample_df.sample(frac = 1)
    # Resetting the Index
    sample_df.reset_index(inplace = True, drop = True)
    return sample_df
sample = propStratifiedSampling(df, 'R Status', 30)
sample
| | Age | Gender | State | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | ... | R7 | R8 | R9 | R10 | R11 | R12 | SS Score | SS Status | R Score | R Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | Male | Tamil Nadu | 5 | 3 | 5 | 4 | 3 | 4 | 3 | ... | 1 | 5 | 3 | 4 | 2 | 1 | 3.833333 | Moderate | 32 | Developing |
| 1 | 19 | Male | Maharashtra | 5 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 1 | 2 | 3 | 4 | 4 | 3 | 5.250000 | High | 35 | Developing |
| 2 | 19 | Male | Tamil Nadu | 4 | 4 | 4 | 4 | 4 | 4 | 4 | ... | 3 | 3 | 4 | 4 | 4 | 3 | 4.000000 | Moderate | 46 | Strong |
| 3 | 19 | Female | Rajasthan | 7 | 6 | 6 | 7 | 7 | 5 | 5 | ... | 1 | 4 | 4 | 5 | 3 | 3 | 6.083333 | High | 35 | Developing |
| 4 | 19 | Female | Uttar Pradesh | 5 | 5 | 5 | 5 | 5 | 5 | 5 | ... | 4 | 1 | 3 | 4 | 3 | 3 | 5.000000 | Moderate | 36 | Developing |
| 5 | 19 | Others | Nagaland | 1 | 2 | 2 | 4 | 1 | 4 | 1 | ... | 2 | 2 | 3 | 3 | 2 | 5 | 1.666667 | Low | 35 | Developing |
| 6 | 19 | Female | Maharashtra | 4 | 5 | 5 | 3 | 4 | 6 | 6 | ... | 2 | 3 | 3 | 4 | 4 | 3 | 4.750000 | Moderate | 39 | Established |
| 7 | 20 | Male | Gujarat | 5 | 6 | 6 | 6 | 6 | 6 | 2 | ... | 2 | 4 | 2 | 4 | 2 | 4 | 4.583333 | Moderate | 38 | Established |
| 8 | 20 | Male | Bihar | 3 | 2 | 6 | 6 | 2 | 2 | 2 | ... | 3 | 3 | 3 | 3 | 3 | 3 | 3.416667 | Moderate | 36 | Developing |
| 9 | 20 | Male | Bihar | 4 | 2 | 5 | 5 | 5 | 1 | 5 | ... | 2 | 4 | 5 | 3 | 3 | 4 | 4.666667 | Moderate | 35 | Developing |
| 10 | 19 | Male | West Bengal | 5 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 4 | 4 | 4 | 4 | 4 | 4 | 5.916667 | High | 46 | Strong |
| 11 | 18 | Male | Delhi | 7 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 2 | 2 | 4 | 4 | 2 | 3 | 5.583333 | High | 41 | Established |
| 12 | 23 | Male | Maharashtra | 7 | 7 | 7 | 6 | 7 | 6 | 6 | ... | 5 | 1 | 3 | 4 | 5 | 5 | 6.500000 | High | 49 | Exceptional |
| 13 | 23 | Female | Uttar Pradesh | 4 | 4 | 4 | 6 | 4 | 6 | 2 | ... | 1 | 5 | 4 | 3 | 5 | 3 | 4.500000 | Moderate | 36 | Developing |
| 14 | 18 | Male | Tamil Nadu | 2 | 2 | 6 | 5 | 2 | 2 | 2 | ... | 2 | 2 | 4 | 2 | 4 | 4 | 3.083333 | Moderate | 42 | Established |
| 15 | 19 | Female | Puducherry | 4 | 7 | 2 | 4 | 7 | 4 | 3 | ... | 3 | 4 | 3 | 4 | 4 | 4 | 4.083333 | Moderate | 44 | Strong |
| 16 | 19 | Male | Rajasthan | 4 | 4 | 4 | 3 | 3 | 6 | 7 | ... | 5 | 1 | 5 | 5 | 5 | 5 | 4.500000 | Moderate | 56 | Exceptional |
| 17 | 19 | Male | Telangana | 1 | 1 | 5 | 5 | 1 | 6 | 6 | ... | 5 | 3 | 4 | 3 | 4 | 1 | 3.666667 | Moderate | 39 | Established |
| 18 | 16 | Female | Gujarat | 3 | 5 | 4 | 4 | 5 | 5 | 3 | ... | 2 | 2 | 2 | 4 | 3 | 3 | 4.000000 | Moderate | 33 | Developing |
| 19 | 22 | Male | Maharashtra | 3 | 3 | 6 | 5 | 7 | 7 | 6 | ... | 4 | 3 | 4 | 4 | 4 | 5 | 5.250000 | High | 49 | Exceptional |
| 20 | 16 | Male | Gujarat | 6 | 5 | 6 | 3 | 6 | 6 | 7 | ... | 4 | 3 | 5 | 4 | 4 | 3 | 5.500000 | High | 43 | Established |
| 21 | 19 | Male | Maharashtra | 4 | 5 | 5 | 5 | 5 | 5 | 1 | ... | 3 | 3 | 3 | 4 | 4 | 4 | 4.416667 | Moderate | 44 | Strong |
| 22 | 16 | Male | Gujarat | 1 | 1 | 1 | 1 | 1 | 1 | 5 | ... | 5 | 5 | 5 | 5 | 5 | 5 | 2.083333 | Low | 40 | Established |
| 23 | 16 | Female | Maharashtra | 5 | 5 | 6 | 5 | 6 | 6 | 6 | ... | 3 | 2 | 3 | 3 | 2 | 3 | 5.666667 | High | 32 | Developing |
| 24 | 18 | Female | Tamil Nadu | 6 | 4 | 4 | 5 | 6 | 5 | 6 | ... | 4 | 5 | 3 | 2 | 5 | 4 | 5.083333 | High | 43 | Established |
| 25 | 18 | Male | Gujarat | 5 | 7 | 3 | 1 | 6 | 7 | 7 | ... | 2 | 4 | 4 | 1 | 1 | 3 | 5.416667 | High | 36 | Developing |
| 26 | 49 | Female | Gujarat | 3 | 7 | 6 | 6 | 7 | 6 | 6 | ... | 4 | 2 | 4 | 5 | 4 | 4 | 5.833333 | High | 42 | Established |
| 27 | 21 | Female | Gujarat | 6 | 7 | 7 | 7 | 7 | 6 | 6 | ... | 3 | 5 | 3 | 4 | 5 | 4 | 6.500000 | High | 46 | Strong |
| 28 | 18 | Male | Telangana | 5 | 7 | 5 | 5 | 7 | 7 | 7 | ... | 4 | 2 | 3 | 4 | 4 | 4 | 6.250000 | High | 45 | Strong |
| 29 | 19 | Female | Gujarat | 5 | 5 | 2 | 2 | 5 | 6 | 5 | ... | 2 | 3 | 4 | 5 | 2 | 2 | 4.166667 | Moderate | 43 | Established |
30 rows × 31 columns
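Because the sampling above is unseeded, every run draws a different sample. The sketch below shows one possible way to get the same proportional allocation with `groupby` and a fixed `random_state`; `prop_stratified_sample`, its `seed` parameter and the `toy` frame are hypothetical and not part of the notebook.

```python
import pandas as pd

def prop_stratified_sample(df, column, sample_size, seed=None):
    """Proportionally allocated stratified sample (hypothetical helper)."""
    population_size = len(df)
    parts = []
    for _, stratum in df.groupby(column):
        # n_i = n * N_i / N, rounded to a whole number of rows
        n_i = round(sample_size * len(stratum) / population_size)
        parts.append(stratum.sample(n=n_i, random_state=seed))
    # Shuffle the combined sample and reset the index
    combined = pd.concat(parts).sample(frac=1, random_state=seed)
    return combined.reset_index(drop=True)

# Toy frame with the same stratum sizes as the dataset
toy = pd.DataFrame({"R Status": ["Developing"] * 39 + ["Established"] * 35
                                + ["Exceptional"] * 12 + ["Strong"] * 22})
toy_sample = prop_stratified_sample(toy, "R Status", 30, seed=0)
print(toy_sample["R Status"].value_counts().to_dict())
```

Passing the same `seed` reproduces the same rows, which keeps any inference written about the sample in step with the sample itself.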
# --- 1) Finding outliers via a boxplot in the 'Age' column ---
sns.set(rc={'figure.figsize':(10, 5)})
sns.boxplot(x = sample["Age"]).set(title='Age Distribution')
[Text(0.5, 1.0, 'Age Distribution')]
Inference: The boxplot shows outliers in 'Age' that fall outside the population targeted by this study. Each outlier is examined individually, and the entries that do not belong to the target population are dropped.
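The points a boxplot draws as outliers are, by default, those lying more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule, applied to the 'Age' values of the 30-row sample displayed above (the variable names are illustrative):

```python
import pandas as pd

# 'Age' values of the 30-row sample displayed above
ages = pd.Series([18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 19, 18, 23, 23, 18,
                  19, 19, 19, 16, 22, 16, 19, 16, 16, 18, 18, 49, 21, 18, 19])

q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the whisker range are flagged as outliers
outliers = ages[(ages < lower) | (ages > upper)]
print(f"whisker range: [{lower}, {upper}]")
print(sorted(outliers.tolist()))
```

The age of 49 stands far apart from the rest of the sample, which is the kind of entry the inference above refers to.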
# --- 2) Distribution of values in each numeric column ---
col_list = df.drop(['Gender', 'State', 'SS Status', 'R Status'], axis = 1)
plt.figure(figsize=(20,20))
# Iterating over a DataFrame yields its column names, so every remaining column is plotted
column_list = col_list.columns
for plt_num, i in enumerate(column_list, start = 1):
    plt.subplot(6, 6, plt_num)
    sns.histplot(df[i])
plt.tight_layout()
Inference: The histograms above give a rough idea of how the values are distributed in each column.
# --- 3) Percentage of people in each State ---
state_unique = list(sample['State'].unique())
state_data = []
for i in range(len(state_unique)):
    state_data.append(len(sample[sample['State'] == state_unique[i]]))
# Wedge properties
wp = { 'linewidth' : 1, 'edgecolor' : "green" }
# Creating autopct arguments: percentage plus the absolute number of students
def func(pct, allvalues):
    absolute = int(round(pct / 100.*np.sum(allvalues)))
    return "{:.1f}%\n({:d})".format(pct, absolute)
# Creating plot
fig, ax = plt.subplots(figsize =(30, 15))
wedges, texts, autotexts = ax.pie(state_data,
                                  autopct = lambda pct: func(pct, state_data),
                                  labels = state_unique,
                                  shadow = True,
                                  startangle = 90,
                                  wedgeprops = wp,
                                  textprops = dict(color ="black"))
# Adding legend (state names, matching the legend title)
ax.legend(wedges, state_unique,
          title = "States",
          loc ="center left",
          bbox_to_anchor =(1, 0, 0.5, 1))
plt.setp(autotexts, size = 8, weight = "bold")
ax.set_title("Percentage of students in each State")
# show plot
plt.show()
Inference: We can see that the largest group of people who filled in the form is from Gujarat, followed by people from Maharashtra.
# --- 4) Number of people belonging to each gender ---
plt.hist(sample['Gender'])
plt.title('Gender Count')
Text(0.5, 1.0, 'Gender Count')
Inference: We can see that there is a majority of Males followed by Females and lastly others.
# --- 5) Percentage of People belonging in Each Social Support Category ---
ss_status_unique = list(sample['SS Status'].unique())
ss_status_data = []
for i in range(len(ss_status_unique)):
    ss_status_data.append(len(sample[sample['SS Status'] == ss_status_unique[i]]))
# Creating plot
fig = plt.figure(figsize =(10, 7))
plt.pie(ss_status_data, labels = ss_status_unique, autopct='%1.1f%%')
# show plot
plt.title('Distribution of Students in Social Support Categories')
plt.show()
Inference: Majority of the students fall under the 'Moderate' category in their Social Support domain.
# --- 6) Percentage of People belonging in Each Resilience Category ---
r_status_unique = list(sample['R Status'].unique())
r_status_data = []
for i in range(len(r_status_unique)):
    r_status_data.append(len(sample[sample['R Status'] == r_status_unique[i]]))
# Creating plot
fig = plt.figure(figsize =(10, 7))
plt.pie(r_status_data, labels = r_status_unique, autopct='%1.1f%%')
# show plot
plt.show()
Inference: We infer from the graph that the largest share of students falls in the 'Developing' category.
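The category shares drawn in the pie charts can also be computed directly, without the counting loops. The sketch below uses `value_counts(normalize=True)` on the R Status counts of the 30-row sample shown earlier; the series `status` is rebuilt here only for illustration.

```python
import pandas as pd

# R Status counts of the 30-row sample shown earlier
status = pd.Series(["Developing"] * 11 + ["Established"] * 10
                   + ["Strong"] * 6 + ["Exceptional"] * 3, name="R Status")

# Share of each category, in percent
shares = status.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

'Developing' is the largest category here but covers well under half of the sample.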
# --- 7) Correlation Heatmap ---
sns.set(rc={'figure.figsize':(25, 10)})
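# Correlation is only defined for numeric columns, so the text columns are excluded first
corr_map = sns.heatmap(sample.select_dtypes('number').corr().round(2), annot=True)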
# --- 8) SS Score vs R Score ---
sns.regplot(data = df, x = 'SS Score', y = 'R Score').set(title = 'SS Score vs R Score')
[Text(0.5, 1.0, 'SS Score vs R Score')]
Inference: It is observed that there is a slight linear relation between 'SS Score' and 'R Score'. (The same was hinted by the correlation heatmap)
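The strength of a linear relation can be quantified with the Pearson correlation coefficient. The sketch below computes it by hand for the 'SS Score' and 'R Score' columns of the 30-row sample shown earlier, not for the full 108-row dataset used in the plot; the function name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# 'SS Score' and 'R Score' of the 30-row sample shown earlier
ss_score = [3.833333, 5.250000, 4.000000, 6.083333, 5.000000, 1.666667,
            4.750000, 4.583333, 3.416667, 4.666667, 5.916667, 5.583333,
            6.500000, 4.500000, 3.083333, 4.083333, 4.500000, 3.666667,
            4.000000, 5.250000, 5.500000, 4.416667, 2.083333, 5.666667,
            5.083333, 5.416667, 5.833333, 6.500000, 6.250000, 4.166667]
r_score = [32, 35, 46, 35, 36, 35, 39, 38, 36, 35, 46, 41, 49, 36, 42,
           44, 56, 39, 33, 49, 43, 44, 40, 32, 43, 36, 42, 46, 45, 43]

r = pearson_r(ss_score, r_score)
print(round(r, 2))
```

For this sample the coefficient is positive but well below 0.5, in line with the slight relation described above.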
# --- 9) SS Status vs Gender ---
gender_unique = list(sample['Gender'].unique())
# Labels are built alongside the counts so that the two always line up,
# whatever order the categories appear in the shuffled sample
gender_ss_label = []
gender_ss_data = []
for i in range(len(ss_status_unique)):
    for j in range(len(gender_unique)):
        gender_ss_label.append(gender_unique[j] + '_' + ss_status_unique[i])
        gender_ss_data.append(len(sample[(sample['SS Status'] == ss_status_unique[i]) & (sample['Gender'] == gender_unique[j])]))
# Plot
sns.swarmplot(x = gender_ss_data, y = gender_ss_label, palette = "deep")
<AxesSubplot:>
Inference:
Low Status count: Male = Others > Female
Moderate Status count: Male > Female > Others
High Status count: Male > Female > Others
# --- 10) R Status vs Gender ---
# Labels are built alongside the counts so that the two always line up
gender_r_label = []
gender_r_data = []
for i in range(len(r_status_unique)):
    for j in range(len(gender_unique)):
        gender_r_label.append(gender_unique[j] + '_' + r_status_unique[i])
        # Both masks come from 'sample'; mixing in df['Gender'] misaligns the index
        gender_r_data.append(len(sample[(sample['R Status'] == r_status_unique[i]) & (sample['Gender'] == gender_unique[j])]))
sns.swarmplot(x = gender_r_data, y = gender_r_label, palette = "deep").set(title = 'R Status vs Gender')
[Text(0.5, 1.0, 'R Status vs Gender')]
Inference:
Developing Status count: Male = Female > Others
Established Status count: Male > Female > Others
Exceptional Status count: Male > Female = Others
Strong Status count: Male > Female > Others
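Counts for every combination of two categorical columns can be tabulated in one call with `pd.crosstab`, which avoids maintaining label lists by hand. The sketch below rebuilds the 'Gender' and 'R Status' columns of the 30-row sample shown earlier from compact codes; the codes and the name `sample_demo` are illustrative only.

```python
import pandas as pd

# 'Gender' and 'R Status' of the 30-row sample shown earlier, in row order
gender_codes = "MMMFFOFMMMMMMFMFMMFMMMMFFMFFMF"
status_codes = ("Dev Dev Str Dev Dev Dev Est Est Dev Dev Str Est Exc Dev Est "
                "Str Exc Est Dev Exc Est Str Est Dev Est Dev Est Str Str Est").split()

gender_names = {"M": "Male", "F": "Female", "O": "Others"}
status_names = {"Dev": "Developing", "Est": "Established",
                "Exc": "Exceptional", "Str": "Strong"}

sample_demo = pd.DataFrame({
    "Gender": [gender_names[c] for c in gender_codes],
    "R Status": [status_names[c] for c in status_codes],
})

# Rows: gender, columns: resilience category, cells: number of respondents
table = pd.crosstab(sample_demo["Gender"], sample_demo["R Status"])
print(table)
```

The same call with 'SS Status' in place of 'R Status' gives the Social Support breakdown.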
# --- 11) SS Score vs State ---
sns.set(rc={'figure.figsize':(40, 10)})
sns.set_context("paper", font_scale=2)
bar1 = sns.barplot(data = sample, x = "State", y = "SS Score", ci = None)
for item in bar1.get_xticklabels():
    item.set_rotation(45)
bar1.set(title = 'SS Score vs State')
[Text(0.5, 1.0, 'SS Score vs State')]
Inference: It is observed that, among the sampled states, people from 'West Bengal' have the highest average Social Support Score.
# --- 12) R Score vs State ---
sns.set(rc={'figure.figsize':(40, 10)})
sns.set_context("paper", font_scale=2)
bar2 = sns.barplot(data = sample, x = "State", y = "R Score", ci = None)
for item in bar2.get_xticklabels():
    item.set_rotation(45)
bar2.set(title = "R Score vs State")
[Text(0.5, 1.0, 'R Score vs State')]
Inference: It is observed that, among the sampled states, people from 'West Bengal' have the highest average Resilience Score.
# --- 13) R Score vs State vs Gender ---
bar3 = sns.catplot(data = sample, x = "State", y = "R Score", hue = 'Gender' , height = 9, aspect = 4, ci = None)
bar3.set_xticklabels(rotation=30)
bar3.set(title = 'R Score vs State vs Gender')
<seaborn.axisgrid.FacetGrid at 0x24b9e9af6a0>
Inference: It is observed that the highest R Score belongs to a Male from Rajasthan, whereas the lowest R Scores belong to one Male from Tamil Nadu and two Females, from Maharashtra and Gujarat.
# --- 14) SS Score vs State vs Gender ---
bar4 = sns.catplot(data = sample, x = "State", y = "SS Score", hue = 'Gender' , height = 9, aspect = 4, ci = None)
bar4.set_xticklabels(rotation=30)
bar4.set(title = 'SS Score vs State vs Gender')
<seaborn.axisgrid.FacetGrid at 0x24b9ee00fa0>
Inference: It is observed that the highest SS Score (6.5) is shared by a Female from Gujarat and a Male from Maharashtra, whereas the lowest SS Score belongs to a respondent from Nagaland whose gender is recorded as 'Others'.
# --- 17) R Score on the Indian map ---
fig = px.choropleth(
    sample,
    geojson = "https://gist.githubusercontent.com/jbrobst/56c13bbbf9d97d187fea01ca62ea5112/raw/e388c4cae20aa53cb5090210a42ebb9b765c0a36/india_states.geojson",
    featureidkey = 'properties.ST_NM',
    locations = 'State',
    color = 'R Score',
    color_continuous_scale='Reds',
    #mapbox_style="carto-positron",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
Inference: The above map shades each state by the R Score of its sampled respondents; it shows score values, not counts.
# --- 18) SS Score on the Indian map ---
fig = px.choropleth(
    sample,
    geojson = "https://gist.githubusercontent.com/jbrobst/56c13bbbf9d97d187fea01ca62ea5112/raw/e388c4cae20aa53cb5090210a42ebb9b765c0a36/india_states.geojson",
    featureidkey = 'properties.ST_NM',
    locations = 'State',
    color = 'SS Score',
    color_continuous_scale='Reds',
    # mapbox_style="carto-positron",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
Inference: The above map shades each state by the SS Score of its sampled respondents; it shows score values, not counts.
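Several respondents can share a state, while a map has room for only one colour per state. One way to make the shading unambiguous is to aggregate to one row per state before plotting. The following is a minimal sketch using a few (State, R Score) pairs taken from the sample shown earlier; the frame `rows` and the column names `mean_r_score` and `respondents` are illustrative.

```python
import pandas as pd

# A few (State, R Score) pairs taken from the sample shown earlier
rows = pd.DataFrame({
    "State": ["Rajasthan", "Rajasthan", "West Bengal",
              "Telangana", "Telangana", "Puducherry"],
    "R Score": [35, 56, 46, 39, 45, 44],
})

# One row per state: mean score and the number of respondents behind it
state_summary = (rows.groupby("State", as_index=False)
                     .agg(mean_r_score=("R Score", "mean"),
                          respondents=("R Score", "count")))
print(state_summary)
```

A frame like `state_summary` could then be passed to the choropleth with `color = 'mean_r_score'`. Keeping the respondent count alongside makes it visible when a state's colour rests on a single person.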
Predicting Resilience Score from Social Support variables (SS1, SS2...SS12, SS Score).
df.head()
| | Age | Gender | State | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | ... | R7 | R8 | R9 | R10 | R11 | R12 | SS Score | SS Status | R Score | R Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | Female | Telangana | 5 | 3 | 6 | 5 | 4 | 6 | 5 | ... | 1 | 1 | 1 | 3 | 1 | 1 | 4.916667 | Moderate | 18 | Developing |
| 1 | 19 | Female | Jharkhand | 4 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 2 | 3 | 3 | 3 | 3 | 3 | 5.833333 | High | 38 | Established |
| 2 | 19 | Male | Chhattisgarh | 4 | 5 | 7 | 6 | 4 | 5 | 5 | ... | 4 | 3 | 5 | 5 | 4 | 4 | 5.250000 | High | 52 | Exceptional |
| 3 | 19 | Male | Maharashtra | 5 | 6 | 6 | 6 | 6 | 6 | 6 | ... | 1 | 2 | 3 | 4 | 4 | 3 | 5.250000 | High | 35 | Developing |
| 4 | 19 | Male | Andhra Pradesh | 6 | 4 | 6 | 4 | 4 | 4 | 4 | ... | 3 | 3 | 3 | 2 | 3 | 4 | 4.333333 | Moderate | 36 | Developing |
5 rows × 31 columns
# Pre-processing the Gender column through LabelEncoder
gender_le = LabelEncoder()
df['Gender'] = gender_le.fit_transform(df['Gender'])
# LabelEncoder assigns codes in sorted order of the labels:
# 0 --> Female
# 1 --> Male
# 2 --> Others
# Pre-processing the State column through LabelEncoder
state_le = LabelEncoder()
df.State = state_le.fit_transform(df.State)
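The integer codes produced by label encoding follow the sorted order of the distinct labels. The pure-Python sketch below mimics that mapping so the codes can be read off without fitting an encoder; `label_codes` is a hypothetical helper, not a scikit-learn function.

```python
def label_codes(values):
    """Map each distinct label to an integer code in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

# Illustrative gender values in arbitrary order
genders = ["Male", "Female", "Others", "Male", "Female"]
codes = label_codes(genders)
print(codes)
```

Because the codes are arbitrary integers, a linear model treats them as ordered magnitudes. For a column with many unordered categories such as 'State', that is a modelling assumption worth keeping in mind.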
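# Defining independent (X) and dependent (y) variables
# 'R Score' is dropped from X as well: keeping the target among the features leaks it into the model
X = df.drop(['R1', 'R2', 'R3', 'R4', 'R5', 'R6', 'R7', 'R8', 'R9', 'R10', 'R11', 'R12','SS Status', 'R Status', 'R Score'], axis = 1)
y = df['R Score']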
# Splitting the data into training data and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 5)
linear_regressor = LinearRegression()
# Fitting the Data
linear_regressor.fit(X_train, y_train)
# Predicting the Target Variable
y_pred = linear_regressor.predict(X_test)
# Plotting the true and predicted value
fig = plt.figure(figsize =(20, 10))
sns.regplot(x = y_test, y = y_pred)
plt.xlabel("R Score Test")
plt.ylabel("R Score Predicted")
plt.title("True Value vs Predicted Value")
plt.show()
# Evaluation Metrics
# LinearRegression.score returns the coefficient of determination (R^2), not an accuracy
print("The R^2 score of the model (via score) is", linear_regressor.score(X_test, y_test))
print("The r2 score of the model is", r2_score(y_test, y_pred))
print("The Mean Absolute Error of the model is", mean_absolute_error(y_test, y_pred))
print("The Mean Squared Error of the model is", mean_squared_error(y_test, y_pred))
The R^2 score of the model (via score) is 1.0
The r2 score of the model is 1.0
The Mean Absolute Error of the model is 7.536059318667729e-15
The Mean Squared Error of the model is 8.41451632235746e-29
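For reference, the three metrics printed above have simple definitions that can be computed by hand. The sketch below is a plain-Python illustration on made-up numbers; `regression_metrics` and the example values are hypothetical and are not results from this dataset.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equally long sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2

# Hypothetical true and predicted scores
y_true_demo = [30, 40, 50, 60]
y_pred_demo = [32, 38, 53, 57]
mae, mse, r2 = regression_metrics(y_true_demo, y_pred_demo)
print(mae, mse, round(r2, 3))
```

An R² of exactly 1.0 requires every residual to be zero, which is rare on real survey data.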
print("The intercept of the model is", linear_regressor.intercept_, "\n\nThe weights of the features are as follows:\n", linear_regressor.coef_)
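The intercept of the model is -3.552713678800501e-14

The weights of the features are as follows:
 [-6.18547845e-18 1.14383311e-15 -1.85103699e-16 -1.72212377e-15 -2.95503785e-16 -3.04147391e-16
 -2.50319851e-16 -4.97898256e-16 2.58678808e-16 -5.11337749e-17 3.22046882e-16 -1.33074664e-16
 9.20689422e-16 -1.92055263e-16 -1.00830170e-16 -1.61878490e-16 1.00000000e+00]
Inference: An R² of exactly 1.0, errors at the level of floating-point precision, and a weight of 1.0 on the last of the 17 features with near-zero weights elsewhere show that this output comes from a fit in which 'R Score' itself was among the features in X. The model is reproducing its target, so these numbers do not measure how well the Social Support variables predict Resilience.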
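The weight pattern above (one weight of 1.0, the rest near zero) is what least squares produces whenever the target is also present as a feature. The sketch below reproduces the effect on synthetic data with NumPy; the random data and variable names are hypothetical and unrelated to the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one unrelated feature and a target
feature = rng.normal(size=50)
target = rng.normal(size=50)

# Design matrix that (wrongly) contains the target itself, plus an intercept column
X_leaky = np.column_stack([feature, target, np.ones(50)])
weights, *_ = np.linalg.lstsq(X_leaky, target, rcond=None)
print(np.round(weights, 6))
```

The fit puts all of its weight on the leaked column and ignores everything else, regardless of how informative the other features are.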
# Equation of the model: R Score = intercept + sum(weight_i * x_i)
def predictRScore(regressor, x):
    intercept = regressor.intercept_
    weights = regressor.coef_
    equation = 0
    for i in range(len(weights)):
        equation += (weights[i]*x[i])
    equation += intercept
    print("The R Score is " + str(equation))
    return equation
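The prediction equation is a weighted sum of the feature values plus the intercept. A minimal stand-alone sketch with made-up numbers follows; `predict_score`, the intercept, the weights and the feature values are all hypothetical and do not come from the fitted model.

```python
def predict_score(intercept, weights, x):
    """Linear prediction: intercept + sum of weight_i * x_i."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    return intercept + sum(w * v for w, v in zip(weights, x))

# Hypothetical intercept, weights and feature values
prediction = predict_score(10.0, [2.0, 0.5], [3.0, 4.0])
print(prediction)  # 10 + 2*3 + 0.5*4 = 18.0
```

The feature values in `x` must be supplied in the same column order as the training matrix, with categorical columns already encoded the same way.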